{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Neptune Analytics as Hybrid Graph/Vector Store\n",
    "\n",
    "Cognee.ai supports Neptune Analytics as a hybrid adaptor: providing both graph and vector storage. This allows cognee to use the same storage medium for graph-based queries and vector-similarity searches.\n",
    "\n",
    "In this notebook, we demonstrate how to connect to an Amazon Neptune Analytics instance using the Cognee.ai configuration, which uses AWS Langchain and boto3 under the hood to connect to the AWS service.\n",
    "\n",
    "Apart from the general installation of Cognee.ai, you will need an Amazon Neptune Analytics instance running with access.\n",
    "\n",
    "References:\n",
    "- [What is Neptune Analytics](https://docs.aws.amazon.com/neptune-analytics/latest/userguide/what-is-neptune-analytics.html)\n",
    "- [Vector Similarity using Neptune Analytics](https://docs.aws.amazon.com/neptune-analytics/latest/userguide/vector-similarity.html)\n",
    "- [Amazon CLI credentials and configuration](https://docs.aws.amazon.com/cli/v1/userguide/cli-chap-configure.html#configure-precedence)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Prerequisites\n",
    "\n",
    "## 1. Amazon Neptune Analytics Instance Setup\n",
    "\n",
    "Create an Amazon Neptune Analytics instance in your AWS account following the [AWS documentation](https://docs.aws.amazon.com/neptune-analytics/latest/userguide/get-started.html). Please note you will also need to configuration the following:\n",
    "- Under `Network and Security` | `enable public connectivity`, allow your graph to be reachable over the internet if accessing from outside a VPC.\n",
    "- Under `Vector search settings` | `Vector search dimension configuration` | `Use vector dimension`. The Neptune Analytics instance must be created using the same vector dimensions as the embedding model creates. See: https://docs.aws.amazon.com/neptune-analytics/latest/userguide/vector-index.html. For example, if using OpenAI LLM `openai/text-embedding-3-small`, which uses 1536-dimension embeddings, your Neptune Analytics vector store must be configured to accept 1536-dimension vectors.\n",
    "- Once the Amazon Neptune Analytics instance is available, you will need the graph-identifier to connect.\n",
    "\n",
    "## 2. Attach Credentials\n",
    "\n",
    "Configure your AWS credentials with access to your Amazon Neptune Analytics resources by following the [Configuration and credentials precedence](https://docs.aws.amazon.com/cli/v1/userguide/cli-chap-configure.html#configure-precedence). You can do this by declaring environment variables in your `.env` file in the project root directory and importing dotenv.\n",
    "\n",
    "```\n",
    "export AWS_ACCESS_KEY_ID=your-access-key\n",
    "export AWS_SECRET_ACCESS_KEY=your-secret-key\n",
    "export AWS_SESSION_TOKEN=your-session-token\n",
    "export AWS_DEFAULT_REGION=your-region\n",
    "\n",
    "# this is the NA graph identifier\n",
    "export AWS_NEPTUNE_ANALYTICS_GRAPH_ID=g-your-graph\n",
    "```\n",
    "\n",
    "The IAM user or role making the request must have a policy attached that allows one of the following IAM actions in that neptune-graph:\n",
    "```\n",
    "neptune-graph:ReadDataViaQuery\n",
    "neptune-graph:WriteDataViaQuery\n",
    "neptune-graph:DeleteDataViaQuery\n",
    "```\n",
    "\n",
    "## 3. Configure Cognee.ai\n",
    "\n",
    "To connect to Amazon Neptune Analytics, you need to add the \"neptune_analytics\" provider and graph endpoint url to your graph and vector configuration.\n",
    "\n",
    "```python\n",
    "import os\n",
    "import cognee\n",
    "from dotenv import load_dotenv\n",
    "\n",
    "# load environment variables from .env\n",
    "load_dotenv()\n",
    "\n",
    "graph_identifier = os.getenv('AWS_NEPTUNE_ANALYTICS_GRAPH_ID', \"\") # graph with 1536 dimensions for vector search\n",
    "\n",
    "# Configure Neptune Analytics as the graph & vector database provider\n",
    "cognee.config.set_graph_db_config(\n",
    "    {\n",
    "        \"graph_database_provider\": \"neptune_analytics\",  # Specify Neptune Analytics as provider\n",
    "        \"graph_database_url\": f\"neptune-graph://{graph_identifier}\",  # Neptune Analytics endpoint with the format neptune-graph://<GRAPH_ID>\n",
    "    }\n",
    ")\n",
    "cognee.config.set_vector_db_config(\n",
    "    {\n",
    "        \"vector_db_provider\": \"neptune_analytics\",  # Specify Neptune Analytics as provider\n",
    "        \"vector_db_url\": f\"neptune-graph://{graph_identifier}\",  # Neptune Analytics endpoint with the format neptune-graph://<GRAPH_ID>\n",
    "    }\n",
    ")\n",
    "```"
   ]
  },
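  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. (Optional) Verify the Graph\n",
    "\n",
    "Before configuring Cognee.ai, you can verify that your credentials and graph identifier work with a quick boto3 check. This is a minimal sketch that assumes your `.env` file is set up as described above; `get_graph` is part of the boto3 `neptune-graph` client:\n",
    "\n",
    "```python\n",
    "import os\n",
    "import boto3\n",
    "from dotenv import load_dotenv\n",
    "\n",
    "# load credentials and the graph identifier from .env\n",
    "load_dotenv()\n",
    "\n",
    "client = boto3.client(\"neptune-graph\")\n",
    "graph = client.get_graph(graphIdentifier=os.getenv(\"AWS_NEPTUNE_ANALYTICS_GRAPH_ID\", \"\"))\n",
    "\n",
    "# The graph should be AVAILABLE, and the vector search dimension must match your embedding model.\n",
    "print(graph[\"status\"])\n",
    "print(graph.get(\"vectorSearchConfiguration\"))\n",
    "```"
   ]
  },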
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import pathlib\n",
    "from cognee import config, add, cognify, search, SearchType, prune, visualize_graph\n",
    "from dotenv import load_dotenv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configuration\n",
    "\n",
    "Do all the imports and configure the graph and vector providers.\n",
    "Uses the default openai llm, so make sure you have an openai api key configured or configure another llm."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load environment variables from file .env\n",
    "load_dotenv()\n",
    "\n",
    "current_directory = os.getcwd()\n",
    "\n",
    "data_directory_path = str(\n",
    "    pathlib.Path(\n",
    "        os.path.join(pathlib.Path(current_directory), \".data_storage\")\n",
    "    ).resolve()\n",
    ")\n",
    "# Set up the data directory. Cognee will store files here.\n",
    "config.data_root_directory(data_directory_path)\n",
    "\n",
    "cognee_directory_path = str(\n",
    "    pathlib.Path(\n",
    "        os.path.join(pathlib.Path(current_directory), \".cognee_system\")\n",
    "    ).resolve()\n",
    ")\n",
    "# Set up the Cognee system directory. Cognee will store system files and databases here.\n",
    "config.system_root_directory(cognee_directory_path)\n",
    "\n",
    "# Set up Amazon credentials in .env file and get the values from environment variables\n",
    "graph_identifier = os.getenv('AWS_NEPTUNE_ANALYTICS_GRAPH_ID', \"\")\n",
    "\n",
    "# Configure Neptune Analytics as the graph & vector database provider\n",
    "config.set_graph_db_config(\n",
    "    {\n",
    "        \"graph_database_provider\": \"neptune_analytics\",  # Specify Neptune Analytics as provider\n",
    "        \"graph_database_url\": f\"neptune-graph://{graph_identifier}\",  # Neptune Analytics endpoint with the format neptune-graph://<GRAPH_ID>\n",
    "    }\n",
    ")\n",
    "config.set_vector_db_config(\n",
    "    {\n",
    "        \"vector_db_provider\": \"neptune_analytics\",  # Specify Neptune Analytics as provider\n",
    "        \"vector_db_url\": f\"neptune-graph://{graph_identifier}\",  # Neptune Analytics endpoint with the format neptune-graph://<GRAPH_ID>\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Clean up environment\n",
    "\n",
    "Prune existing data in the graph store"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prune data and system metadata before running, only if we want \"fresh\" state.\n",
    "await prune.prune_data()\n",
    "await prune.prune_system(metadata=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup data and cognify\n",
    "\n",
    "Create a dataset containing Neptune descriptions.  The"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Add sample text to the dataset\n",
    "sample_text_1 = \"\"\"Neptune Analytics is a memory-optimized graph database engine for analytics. With Neptune\n",
    "    Analytics, you can get insights and find trends by processing large amounts of graph data in seconds. To analyze\n",
    "    graph data quickly and easily, Neptune Analytics stores large graph datasets in memory. It supports a library of\n",
    "    optimized graph analytic algorithms, low-latency graph queries, and vector search capabilities within graph\n",
    "    traversals.\n",
    "    \"\"\"\n",
    "\n",
    "sample_text_2 = \"\"\"Neptune Analytics is an ideal choice for investigatory, exploratory, or data-science workloads\n",
    "    that require fast iteration for data, analytical and algorithmic processing, or vector search on graph data. It\n",
    "    complements Amazon Neptune Database, a popular managed graph database. To perform intensive analysis, you can load\n",
    "    the data from a Neptune Database graph or snapshot into Neptune Analytics. You can also load graph data that's\n",
    "    stored in Amazon S3.\n",
    "    \"\"\"\n",
    "\n",
    "# Create a dataset\n",
    "dataset_name = \"neptune_descriptions\"\n",
    "\n",
    "# Add the text data to Cognee.\n",
    "await add([sample_text_1, sample_text_2], dataset_name)\n",
    "\n",
    "# Cognify the text data.\n",
    "await cognify([dataset_name])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Graph Memory visualization\n",
    "\n",
    "Initialize Neptune as a Graph Memory store and save to .artefacts/graph_visualization.html\n",
    "\n",
    "![visualization](./neptune_analytics_demo.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get a graphistry url (Register for a free account at https://www.graphistry.com)\n",
    "# url = await render_graph()\n",
    "# print(f\"Graphistry URL: {url}\")\n",
    "\n",
    "# Or use our simple graph preview\n",
    "graph_file_path = str(\n",
    "    pathlib.Path(\n",
    "        os.path.join(pathlib.Path(current_directory), \".artifacts/graph_visualization.html\")\n",
    "    ).resolve()\n",
    ")\n",
    "await visualize_graph(graph_file_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## SEARCH: Graph Completion\n",
    "\n",
    "Search using the query \"What is Neptune Analytics?\" and return the graph completion with nodes/edges related to the query."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Completion query that uses graph data to form context.\n",
    "graph_completion = await search(query_text=\"What is Neptune Analytics?\", query_type=SearchType.GRAPH_COMPLETION)\n",
    "print(\"\\nGraph completion result is:\")\n",
    "print(graph_completion)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## SEARCH: RAG Completion\n",
    "\n",
    "Search using the query \"What is Neptune Analytics?\" and return a LLM-based completion searches of edges/nodes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Completion query that uses document chunks to form context.\n",
    "rag_completion = await search(query_text=\"What is Neptune Analytics?\", query_type=SearchType.RAG_COMPLETION)\n",
    "print(\"\\nRAG Completion result is:\")\n",
    "print(rag_completion)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## SEARCH: Graph Insights\n",
    "\n",
    "Search for insight relationshipts related to \"Neptune Analytics\" as a context."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Search graph insights\n",
    "insights_results = await search(query_text=\"Neptune Analytics\", query_type=SearchType.GRAPH_COMPLETION)\n",
    "print(\"\\nInsights about Neptune Analytics:\")\n",
    "for result in insights_results:\n",
    "    src_node = result[0].get(\"name\", result[0][\"type\"])\n",
    "    tgt_node = result[2].get(\"name\", result[2][\"type\"])\n",
    "    relationship = result[1].get(\"relationship_name\", \"__relationship__\")\n",
    "    print(f\"- {src_node} -[{relationship}]-> {tgt_node}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## SEARCH: Entity Summaries\n",
    "\n",
    "Search for summary nodes related to \"Neptune Analytics\" as a context."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Query all summaries related to query.\n",
    "summaries = await search(query_text=\"Neptune Analytics\", query_type=SearchType.SUMMARIES)\n",
    "print(\"\\nSummary results are:\")\n",
    "for summary in summaries:\n",
    "    type = summary[\"type\"]\n",
    "    text = summary[\"text\"]\n",
    "    print(f\"- {type}: {text}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## SEARCH: Chunks\n",
    "\n",
    "Search for chuck nodes related to \"Neptune Analytics\" as a context."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "chunks = await search(query_text=\"Neptune Analytics\", query_type=SearchType.CHUNKS)\n",
    "print(\"\\nChunk results are:\")\n",
    "for chunk in chunks:\n",
    "    type = chunk[\"type\"]\n",
    "    text = chunk[\"text\"]\n",
    "    print(f\"- {type}: {text}\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
